Supplementary material for the paper: "Adaptive Bandits: Towards the best history-dependent strategy"
Authors
Abstract
In this document, we give further details of some technical proofs not covered in the paper to which this supplementary material corresponds.

Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors.

1 Playing against an opponent using a known model

1.1 Regret upper bounds against the best history-class-based strategy

Theorem 1. In the case of a Φ-constrained opponent, using the Φ-UCB algorithm with parameter α > 1/2, we have the distribution-dependent bound:

$$R_T^{\Phi} \le \sum_{c \in H/\Phi;\ \mathbb{E}(I_c(T)) > 0} \ \sum_{a \in A;\ \Delta_c(a) > 0} \left( \frac{4\alpha \log(T)}{\Delta_c(a)} + \Delta_c(a)\, c_\alpha \right)$$

where $I_c(T) = \sum_{t=1}^{T} \mathbb{I}[h$ [...] $0\}|$ is the number of classes that may be activated during the run. Now, in the case of an arbitrary opponent, using the Φ-Exp3 algorithm, we have:

$$\tilde{R}_T^{\Phi} \le \frac{3}{\sqrt{2}} \sqrt{T C A \log(A)}.$$

Proof. Φ-UCB: The distribution-dependent bound for Φ-UCB is a direct application of the result of [2] for the UCB algorithm concerning $\tau_a(t) \overset{\text{def}}{=} \sum_{s=1}^{t} \mathbb{I}_{a_s = a}$, where $a_t$ is the arm played by UCB, which states that

$$\mathbb{E}(\tau_a(t)) \le \frac{4\alpha \log(t)}{\Delta_c(a)^2} + c_\alpha.$$

Indeed, we use the fact that $R_T^{\Phi} = \sum_{c \in H/\Phi} R_T(c)$ and remark that whenever a class c is visited, we play according to a UCB algorithm for this class. Thus, for the distribution-free bound, we have: $R_T^{\Phi} = \sum$ ...
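The remark that each visited class is handled by its own UCB learner can be illustrated with a minimal sketch. It is not the authors' implementation: the class `PhiUCB`, the function `simulate`, the class map `phi` (here, "last arm played") and the Bernoulli reward means are all hypothetical, and the index form mean + sqrt(α log(n_c) / n_{c,a}) is an assumption chosen for illustration.

```python
import math
import random
from collections import defaultdict


class PhiUCB:
    """One UCB learner per history class (illustrative sketch only)."""

    def __init__(self, n_arms, phi, alpha=1.0):
        self.n_arms = n_arms
        self.phi = phi      # assumed: maps a history (tuple) to a hashable class label
        self.alpha = alpha  # exploration parameter; the theorem assumes alpha > 1/2
        self.counts = defaultdict(lambda: [0] * n_arms)  # pulls per (class, arm)
        self.sums = defaultdict(lambda: [0.0] * n_arms)  # reward sums per (class, arm)

    def select(self, history):
        """Return (class label, arm) for the current history."""
        c = self.phi(history)
        counts, sums = self.counts[c], self.sums[c]
        # Play every arm once inside a class before using the index.
        for a in range(self.n_arms):
            if counts[a] == 0:
                return c, a
        visits = sum(counts)  # number of past visits to class c

        def index(a):
            bonus = math.sqrt(self.alpha * math.log(visits) / counts[a])
            return sums[a] / counts[a] + bonus

        return c, max(range(self.n_arms), key=index)

    def update(self, c, arm, reward):
        """Record the reward observed for `arm` while in class `c`."""
        self.counts[c][arm] += 1
        self.sums[c][arm] += reward


def simulate(T=2000, seed=0):
    """Toy constrained opponent: reward means depend on the last arm played."""
    rng = random.Random(seed)
    phi = lambda h: h[-1][0] if h else None  # class = last arm (None at start)
    means = {None: [0.5, 0.5], 0: [0.2, 0.8], 1: [0.9, 0.1]}  # hypothetical values
    agent = PhiUCB(2, phi, alpha=1.0)
    history = []
    for _ in range(T):
        c, a = agent.select(tuple(history[-1:]))
        r = 1.0 if rng.random() < means[c][a] else 0.0
        agent.update(c, a, r)
        history.append((a, r))
    return agent
```

Calling `simulate()` returns the agent, whose `counts` dictionary shows how often each arm was pulled inside each class. Keeping separate statistics per class is what allows the total regret to be written as a sum of per-class UCB regrets.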
Similar resources
Adaptive Bandits: Towards the best history-dependent strategy
We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model θ∗ ∈ Θ. The regr...
Adaptive Control Strategy for a Bilateral Tele-Surgery System Interacting with Active Soft Tissues
In this paper, the problem of control and stabilization of a bilateral tele-surgery robotic system in interaction with an active soft tissue is considered. To the best of the authors' knowledge, the previous works did not consider a realistic model for a moving soft tissue like heart tissue in beating heart surgery. Here, a new model is proposed to indicate significant characteristics of a moving ...
The Simulator: Towards a Richer Understanding of Adaptive Sampling in the Moderate-Confidence Regime
In this work, we propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from the existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of pers...
Semi-analytical Solution for Time-dependent Creep Analysis of Rotating Cylinders Made of Anisotropic Exponentially Graded Material (EGM)
In the present paper, time dependent creep behavior of hollow circular rotating cylinders made of exponentially graded material (EGM) is investigated. Loading is composed of an internal pressure, a distributed temperature field due to steady state heat conduction with convective boundary condition and a centrifugal body force. All the material properties are assumed to be exponentially graded a...